An Analytical Study of a Structured Overlay in the 
Presence of Dynamic Membership 

Supriya Krishnamurthy 1,3, Sameh El-Ansary 1, Erik Aurell 1,2 and Seif Haridi 1,3
1 Swedish Institute of Computer Science (SICS), Sweden 
2 Department of Physics, KTH-Royal Institute of Technology, Sweden 
3 IMIT, KTH-Royal Institute of Technology, Sweden 
{supriya,sameh,eaurell,seif}@sics.se 



Abstract — In this paper we present an analytical study of 
dynamic membership (aka churn) in structured peer-to-peer 
networks. We use a fluid model approach to describe steady-state or transient phenomena, and apply it to the Chord system. For any rate of churn and stabilization, and any system size, we accurately account for the functional form of the probability of network disconnection as well as the fraction of failed or incorrect successor and finger pointers. We show how these quantities can be used to predict both the performance and the consistency of lookups under churn. All theoretical predictions match simulation results. The analysis includes both features that are generic to structured overlays deploying a ring and Chord-specific details, and opens the door to a systematic comparative analysis of, at least, ring-based structured overlay systems under churn.

I. Introduction 

An intrinsic property of Peer-to-Peer systems is the process of never-ceasing dynamic membership. Structured Peer-to-Peer Networks (aka Distributed Hash Tables (DHTs)) are based on the principle of arranging nodes in an overlay graph of known topology and diameter. This knowledge makes it possible to provide performance guarantees. However, dynamic membership continuously "corrupts/churns" the overlay graph, and every DHT strives to provide a technique to "correct/maintain" the graph in the face of this perturbation.

Both theoretical and empirical studies have been conducted 
to analyze the performance of DHTs undergoing "churn" and 
simultaneously performing "maintenance". Liben-Nowell et 
al. [11] prove a lower bound on the maintenance rate required 
for a network to remain connected in the face of a given 
dynamic membership rate. Aspnes et al. [3] give upper and 
lower bounds on the number of messages needed to locate 
a node/data item in a DHT in the presence of node or link 
failures. The value of such theoretical studies is that they 
provide insights neutral to the details of any particular DHT. 
Empirical studies have also been conducted to complement these theoretical studies by showing how, within the asymptotic bounds, the performance of a DHT may vary substantially depending on different DHT designs and implementation decisions. Examples include the work of Li et al. [10], Rhea et al. [14], and Rowstron et al. [5].

In this paper, we present a fluid model of Chord [15], a specific DHT, under churn. Fluid models have been used to model data communication systems at least since the early 1980s [2], and in some sense since the work of Erlang [4]. More recently, in the context of P2P systems, they have been used to model the performance of BitTorrent [13] and the Squirrel caching system [6]. This technique has much in common with macroscopic and mesoscopic descriptions of physical and chemical phenomena (from where the term fluid has obviously been borrowed), and carries the same advantages of conciseness and computability relative to an underlying, more exact description. Our analysis is directly based on the master equation approach of physical kinetics (see, e.g., the textbook [12]), which provides a scheme for systematically taking the various dynamical processes involved into account.

The fluid model requires the notion of a state of the system. 
This is just a listing of the quantities one would need to know 
for a description of the system at a given level of detail. 
For Chord, we use grosso modo a level of description which 
requires keeping track of how many nodes there are in the 
system and what the state (whether correct, incorrect or failed) 
of each of the pointers of those nodes is. This information is 
not enough to draw a unique graph of network-connections 
because, for example, if we know that a given node has an 
'incorrect' successor pointer, this still does not tell us which 
node it is pointing to. However, as we will see, beginning at 
this level of description is sufficient to keep track of most of 
the details of the Chord protocols. Having defined a state, the 
fluid model is simply a set of equations for the evolution of the 
probability of finding the system in this state, given the details 
of the dynamics. The master equation approach is useful for keeping track of the contributions of all the events that can change this probability in a micro-instant of time, i.e., for evaluating all the terms in the dynamics leading to a gain or loss of this probability.

Using this formalism we investigate a probabilistic model in which peers arrive independently, distributed as a Poisson process, and life-times are exponentially distributed. While this setup is not necessarily fully realistic (more realistic models can also be analyzed using master equation techniques), it is standard in modeling, as it typically brings out the salient features of the system with as few obscuring details from the probabilistic model as possible. We then derive the functional forms of the following: (i) Chord-specific inter-node distribution properties and (ii) for every outgoing pointer of a Chord node, the probability that it is in any one of its possible states. This probability is different for each of the successor and finger pointers. We then use this information to predict other quantities such as (iii) the probability that the network gets disconnected, (iv) lookup consistency (number of failed lookups), and (v) lookup performance (latency). All quantities are computed as a function of the parameters involved, and all results are verified by simulations.

II. Related Work 

Closest in spirit to our work is the informal derivation in the original Chord paper [15] of the average number of timeouts encountered by a lookup. This quantity was approximated there by the product of the average number of fingers used in a lookup and the probability that a given finger points to a departed node. Our methodology not only allows us to derive the latter quantity systematically but also demonstrates how this probability depends on which finger (or successor) is involved. Further, we are able to derive a precise relation between this probability and lookup performance and consistency, which is accurate at any value of the system parameters.

In the works of Aberer et al. [1] and Wang et al. [16], DHTs are analyzed under churn and the results are compared with simulations. These analyses can also be classified as fluid models. However, their main parameter is the probability that a randomly selected entry of a routing table is stale. In our analysis, we determine this quantity from system details and churn rates.

A brief announcement of the results presented in this paper has appeared earlier in [8].

III. Our Implementation of Chord 

The Chord Ring. The general philosophy of DHTs is to map a set of data items onto a set of nodes, where the insertion and lookup of items is done using the unique keys that the items are given. Chord's realization of that philosophy is as follows. Peers and data items are given unique keys (usually obtained by a cryptographic hash of a unique attribute such as the IP address or public key for nodes, and the filename or checksum for items) drawn from a circular key space of size K. The Chord system dictates that the right place for storing an item is the first alive node whose key succeeds the key of the item. Since we refer to nodes and items by their keys, the insertion and lookup of items becomes a matter of locating the right "successor" of a key. All nodes have successor and predecessor pointers. For N nodes, using only the successor pointers to look up items requires $N/2$ hops on average.
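To make the successor rule concrete, the following Python sketch (an illustration under our own assumptions, not the simulator used for this paper) maps a key to its successor on a ring of size K, treating a node whose key equals the item's key as its successor, and estimates the average number of hops needed when only $s_1$ pointers are followed. The helper names `successor_index`, `successor` and `mean_successor_only_hops` are hypothetical, and all nodes are assumed alive.

```python
import bisect
import random


def successor_index(node_keys, key, K):
    """Index (in sorted node_keys) of the first node whose key is >= key,
    wrapping around the circular key space of size K."""
    i = bisect.bisect_left(node_keys, key % K)
    return i % len(node_keys)  # past the largest key we wrap to the smallest


def successor(node_keys, key, K):
    """Key of the node responsible for `key` (node_keys must be sorted)."""
    return node_keys[successor_index(node_keys, key, K)]


def mean_successor_only_hops(N, K, trials, rng):
    """Average number of s_1 hops from a random node to successor(key),
    for uniformly random keys, on a ring of N alive nodes."""
    nodes = sorted(rng.sample(range(K), N))
    total = 0
    for _ in range(trials):
        start = rng.randrange(N)
        target = successor_index(nodes, rng.randrange(K), K)
        total += (target - start) % N  # clockwise steps along s_1 pointers
    return total / trials


if __name__ == "__main__":
    print(successor([10, 200, 700], 201, 1024))  # 700
    print(mean_successor_only_hops(100, 2**20, 20000, random.Random(7)))
```

Because the starting node is chosen uniformly and independently of the key, the hop count is uniform on $0, \ldots, N-1$, so its mean is $(N-1)/2$, which is the $N/2$ figure quoted above.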

Fingers. To reduce the average lookup path length, nodes keep $M = \log_2 K$ pointers known as the "fingers". Using these fingers, a node can retrieve any key in $O(\log N)$ hops. The fingers of a node n (where $n \in 0 \ldots K-1$) point to exponentially increasing distances of keys away from n. That is, $\forall i \in 1..M$, n points to a node whose key is equal to $n + 2^{i-1}$ (modulo K). We denote that key by n.fin_i.start. However, for a certain i, there might not be a node in the network whose key is equal to $n + 2^{i-1}$. Therefore, n points to the first successor of $n + 2^{i-1}$, which we denote by n.fin_i.node.
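The finger definition can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' code: it assumes K is a power of two (so $M = \log_2 K$), that all nodes are alive, and that fin_i.node is the first node whose key is at least fin_i.start modulo K; `finger_table` is a hypothetical helper.

```python
import bisect


def successor(node_keys, key, K):
    """First node key >= key on the ring of size K (node_keys sorted)."""
    i = bisect.bisect_left(node_keys, key % K)
    return node_keys[i % len(node_keys)]


def finger_table(n, node_keys, K):
    """Return [(fin_i.start, fin_i.node) for i = 1..M] for node n."""
    if K & (K - 1):
        raise ValueError("this sketch assumes K is a power of two")
    M = K.bit_length() - 1  # M = log2(K)
    table = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % K                          # n.fin_i.start
        table.append((start, successor(node_keys, start, K)))  # n.fin_i.node
    return table


print(finger_table(1, [1, 4, 9, 12], 16))  # [(2, 4), (3, 4), (5, 9), (9, 9)]
```

For example, with K = 16 and nodes at keys 1, 4, 9 and 12, node 1 has finger starts 2, 3, 5 and 9, which map to nodes 4, 4, 9 and 9; several small fingers typically share the same node.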

The Successor List. Moreover, each node keeps a list of the $S = O(\log N)$ immediate successors as backups for its first successor. We use the notation n.s to refer to this list and n.s_i to refer to the ith element in the list. Finally, we use the notation n.p to refer to the predecessor.

Stabilization, Churn & Steady State. To keep the pointers up-to-date in the presence of churn, each node performs periodic stabilization of its successors and fingers. In our analysis, we define $\lambda_j$ as the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations per node. The fraction of stabilizations which act on the successors is $\alpha$, such that the rate of successor stabilizations is $\alpha\lambda_s$ and the rate of finger stabilizations is $(1-\alpha)\lambda_s$. In all that follows, we impose the steady-state condition $\lambda_j = \lambda_f$ unless otherwise stated. Further, it is useful to define $r = \lambda_s/\lambda_f$, which is the relevant ratio on which all the quantities we are interested in will depend; e.g., r = 50 means that a join/fail event takes place every half an hour for a stabilization which takes place once every 36 seconds. Throughout the paper we will use the terms $\lambda_j N\Delta t$, $\lambda_f N\Delta t$, $\alpha\lambda_s N\Delta t$ and $(1-\alpha)\lambda_s N\Delta t$ to denote the respective probabilities that a join, a failure, a successor stabilization, or a finger stabilization takes place anywhere on the ring during a micro period of time of length $\Delta t$.
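The following Python sketch illustrates how these parameters combine; all function names are hypothetical. `ratio_r` reproduces the r = 50 example, `event_probabilities` returns the first-order probabilities $\lambda_j N\Delta t$ and so on (which only make sense while their sum is well below 1), and `next_event` samples the next event in a Gillespie-style continuous-time simulation. That last technique is a standard choice we name here for illustration, not necessarily how the simulations in this paper were implemented.

```python
import random


def ratio_r(stabilization_period, churn_period):
    """r = lambda_s / lambda_f, with each per-node rate = 1 / mean time between events."""
    return (1.0 / stabilization_period) / (1.0 / churn_period)


def event_rates(N, lam_j, lam_f, lam_s, alpha):
    """Ring-wide rates of the four kinds of events."""
    return {
        "join": lam_j * N,
        "failure": lam_f * N,
        "successor_stabilization": alpha * lam_s * N,
        "finger_stabilization": (1 - alpha) * lam_s * N,
    }


def event_probabilities(N, lam_j, lam_f, lam_s, alpha, dt):
    """First-order probabilities that each event happens anywhere on the ring in dt."""
    rates = event_rates(N, lam_j, lam_f, lam_s, alpha)
    probs = {kind: rate * dt for kind, rate in rates.items()}
    if sum(probs.values()) >= 1:
        raise ValueError("dt is too large to be a micro-instant")
    return probs


def next_event(N, lam_j, lam_f, lam_s, alpha, rng):
    """Gillespie-style step: exponential waiting time, then the event kind
    chosen with probability proportional to its rate."""
    rates = event_rates(N, lam_j, lam_f, lam_s, alpha)
    wait = rng.expovariate(sum(rates.values()))
    kind = rng.choices(list(rates), weights=list(rates.values()))[0]
    return wait, kind


print(ratio_r(36, 1800))  # one stabilization per 36 s, one join/failure per 30 min
```

Conditioning on exactly one event in $\Delta t$ and picking it in proportion to the rates is what makes the "one event per micro-instant" assumption of Section IV consistent with a simulation.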

Parameters. The parameters of the problem are hence K, N, $\alpha$ and r. All relevant measurable quantities should be entirely expressible in terms of these parameters.

Simulation. Since we are collecting statistics like the probability of a particular finger pointer being wrong, we need to repeat each experiment 100 times before obtaining well-averaged results. The total sequential simulation time for obtaining the results of this paper was about 1800 hours, which was parallelized on a cluster of 14 nodes, where we had N = 1000, $K = 2^{20}$, S = 6, 200 < r < 2000 and $0.25 < \alpha < 0.75$.

While the main outlines of the Chord protocol are provided by its authors in [15], an exact analysis requires a deeper level of detail and explicitly stated assumptions, which we provide in the following subsections.

A. Joins, Failures & Ring Stabilization 

Initialization. Initially, a node knows its key and at least one node with key c that already exists in the network and is alive. The knowledge of such a node is assumed to be acquired through some out-of-band method. The predecessor p, successors (s_1..s_S) and fingers (fin_1..M.node) are all set to nil.

Joins (Fig. 1). A new node n joins by looking up its successor using the initial random contact node c. It then starts its first stabilization of the successors and initializes its fingers.

Stabilization of Successors (Fig. 1). The function fixSuccessors is triggered periodically with rate $\alpha\lambda_s$. A node n tells its first alive successor y that it believes itself to be y's predecessor and expects as an answer y's predecessor y.p and successors y.s. The response of y can lead to three actions:

Case A. Some node exists between n and y (i.e., n's belief is wrong), so n prepends y.p to its successor list as a first successor and retries fixSuccessors.

Case B. y confirms n's belief and informs n of y's old predecessor y.p. Therefore n considers y.p as an alternative/initial predecessor for n. Finally, n reconciles its successor list with y.s.

Case C. y agrees that n is its predecessor, and the only task of n is to update its successor list by reconciling it with y.s.

By calling iThinkIamYourPred (Fig. 1), some node x informs n that it believes itself to be n's predecessor. If n's predecessor p is not alive or is nil, then n accepts x as a predecessor and informs x about this agreement by returning x. Otherwise, if n's predecessor p is alive (how this is discovered is explained shortly in Section III-C), there are two main possibilities. The first is that x lies in the region between n's current predecessor p and n; therefore n should accept x as its new predecessor and inform x about its old predecessor. The second is that n's predecessor pointer p already points to x, so the state is correct at both parties, and n confirms this to x by informing it that x is the predecessor of n. If neither holds, p lies between x and n, and n returns p, which x then handles as Case A. In all cases the function returns a predecessor and a successor list.

The function firstAliveSuccessor (Fig. 1) iterates through the successor list. In each iteration, if the first successor s_1 is alive, it is returned. Otherwise, the dead successor is dropped from the list and nil is appended to the end of the list. If the first successor is nil, all immediate successors are dead and the ring is disconnected.

n.join(c)
  s_1 = c.findSuccessor(n)
  fixSuccessors()
  initFingers(s_1)

n.fixSuccessors()
  y = firstAliveSuccessor()
  {y.p, y.s} = y.iThinkIamYourPred(n)
  if (y.p ∈ (me, y))          // Case A
    prepend(y.p)
    fixSuccessors()
  elsif (y.p ∈ (y, me))       // Case B
    considerANewPred(y.p)
    reconcile(y.s)
  else                        // Case C: y.p == me
    reconcile(y.s)

n.firstAliveSuccessor()
  while (true)
    if (s_1 == nil)
      // Broken Ring!!
    if (isAlive(s_1))
      return (s_1)
    ∀ i ∈ 1..(S-1)
      s_i = s_{i+1}
    s_S = nil

n.considerANewPred(x)
  if (isNotAlive(p) or (p == nil) or (x ∈ (p, n)))
    p = x

n.iThinkIamYourPred(x)
  if (isNotAlive(p) or (p == nil))
    p = x
    return ({s, x})
  if (x ∈ (p, me))
    oldp = p
    p = x
    return ({s, oldp})
  else
    return ({s, p})

n.prepend(y)
  for i = S..2
    s_i = s_{i-1}
  s_1 = y

n.reconcile(s')
  for i = 1..(S-1)
    s_{i+1} = s'_i

Fig. 1
Joins and Ring Stabilization Algorithms.
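To make the interplay of Cases A, B and C concrete, here is a minimal Python sketch of Fig. 1 under assumptions of our own: a hypothetical key space K = 64, a successor-list length S = 3, liveness modelled as a boolean flag that any node can read directly (rather than being discovered as in Section III-C), and the recursive retry of Case A written as a loop. The function names mirror the pseudocode but are illustrative only.

```python
K = 64  # hypothetical key-space size for this sketch
S = 3   # hypothetical successor-list length


def between(x, a, b):
    """x in the open ring interval (a, b) modulo K; (a, a) is the ring minus a."""
    if a < b:
        return a < x < b
    return x > a or x < b


class Node:
    def __init__(self, key):
        self.key = key
        self.s = [None] * S   # successor list s_1..s_S (s[0] is s_1)
        self.p = None         # predecessor
        self.alive = True     # assumption: liveness is directly observable


def first_alive_successor(n):
    while True:
        if n.s[0] is None:
            raise RuntimeError("broken ring: all S successors are dead")
        if n.s[0].alive:
            return n.s[0]
        n.s = n.s[1:] + [None]   # drop the dead s_1 and append nil


def i_think_i_am_your_pred(y, x):
    """Executed by y when x claims to be y's predecessor; returns (pred, copy of y.s)."""
    if y.p is None or not y.p.alive:
        y.p = x
        return x, list(y.s)
    if between(x.key, y.p.key, y.key):
        old_p, y.p = y.p, x
        return old_p, list(y.s)
    return y.p, list(y.s)


def consider_a_new_pred(n, x):
    if n.p is None or not n.p.alive or between(x.key, n.p.key, n.key):
        n.p = x


def prepend(n, y):
    n.s = [y] + n.s[:S - 1]


def reconcile(n, ys):
    n.s = [n.s[0]] + ys[:S - 1]   # keep s_1 = y, then s_{i+1} = y.s_i


def fix_successors(n):
    while True:                   # loop instead of the recursive retry in Case A
        y = first_alive_successor(n)
        yp, ys = i_think_i_am_your_pred(y, n)
        if yp is not n and between(yp.key, n.key, y.key):    # Case A
            prepend(n, yp)
            continue
        if yp is not n and between(yp.key, y.key, n.key):    # Case B
            consider_a_new_pred(n, yp)
        reconcile(n, ys)                                       # Cases B and C
        return


def build_ring(keys):
    """A fully correct ring: every node has exact successors and predecessor."""
    nodes = [Node(k) for k in sorted(keys)]
    for i, node in enumerate(nodes):
        node.s = [nodes[(i + j) % len(nodes)] for j in range(1, S + 1)]
        node.p = nodes[i - 1]
    return nodes
```

For example, building a correct ring over keys 3, 10, 20, 33, 47 and 60, marking node 20 as failed and calling `fix_successors` on node 10 leaves node 10 with the successor list [33, 47, 60] and makes node 10 the predecessor of node 33 (Case C after the dead entry is dropped).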



n.initFingers(s_1)
  f' = s_1.fin
  ∀ i ∈ 1..M s.th. (fin_i.start ∈ (n, s_1]):
    fin_i.node = s_1
  ∀ j ∈ 1..M s.th. (fin_j.start ∉ (n, s_1]):
    fin_j.node = localSuccessor(f', fin_j.start)

n.localSuccessor(f, k)
  for i = 1..M
    if (k ∈ (n, f_i])
      return (f_i)
  return (nil)

n.fixFingers()
  i = random(), with 1 ≤ i ≤ M
  fin_i.node = findSuccessor(fin_i.start)

Fig. 2
Initialization and Stabilization of Fingers.



B. Lookups and Stabilization of Fingers 

Stabilization of Fingers (Fig. 2). Stabilization of fingers occurs at a rate $(1-\alpha)\lambda_s$. Each time the fixFingers function is triggered, a random finger fin_i is chosen, a lookup for fin_i.start is performed, and the result is used to update fin_i.node.



n.findSuccessor(k)
  // Case A: k is exactly equal to n
  if (k == n)
    return (n)
  // Case B: k is between n and s_1
  if (k ∈ (n, s_1])
    return (firstAliveSuccessorNoChange())
  // Case C: forward the lookup to
  // the closest preceding alive finger
  cpf = closestAlivePrecedingFinger(k)
  if (cpf == nil)
    y = firstAliveSuccessorNoChange()
    if (k ∈ (n, y])
      return (y)
    cpf = closestAlivePrecedingSucc(k)
    return (cpf.findSuccessor(k))
  else
    return (cpf.findSuccessor(k))

n.firstAliveSuccessorNoChange()
  i = 1
  while (true)
    if (s_i == nil)
      // Broken Ring!!
    if (isAlive(s_i))
      return (s_i)
    i++

n.closestAlivePrecedingFinger(k)
  for i = M..1
    if ((fin_i ∈ (n, k)) and (fin_i ≠ nil) and isAlive(fin_i))
      return (fin_i)
  return (nil)

n.closestAlivePrecedingSucc(k)
  for i = S..1
    if ((s_i ∈ (n, k)) and (s_i ≠ nil) and isAlive(s_i))
      return (s_i)
  return (nil)

Fig. 3
The Lookup Algorithm.



Initialization of Fingers (Fig. 2). After having initialized its first successor s_1, a node n sets all fingers with starts between n and s_1 to s_1. The rest of the fingers are initialized by taking a copy of the finger table of s_1 and finding an approximate successor to every finger from that finger table.

Lookups (Fig. 3). A lookup operation is a fundamental operation that is used to find the successor of a key. It is used by many other routines, and its performance and consistency are the main quantities of interest in the evaluation of any DHT. A node n looking up the successor of k runs the findSuccessor algorithm, which can lead to the following cases:

Case A. If k is equal to n, then n is trivially the successor of k.

Case B. If $k \in (n, s_1]$, then n has found the successor of k, but it could be that s_1 has failed and n has not yet discovered this. However, entries in the successor list can act as backups for the first successor. Therefore, the first alive successor of n is the successor of k. Note that, in this case, while we try to find the first alive successor, we do not change the entries in the successor list. This is mainly because, to simplify the analysis, we want the successor list to be changed at a fixed rate $\alpha\lambda_s$ only by the fixSuccessors function.

Case C. The lookup should be forwarded to a node closer to k, namely the closest alive finger preceding k in n's finger table. The call to the function closestAlivePrecedingFinger returns such a node if possible, and the lookup is forwarded to it. However, it could be the case that all fingers preceding k are dead. In that case, we need to use the successor list as a last resort for the lookup. Therefore, we locate the first alive successor y, and if $k \in (n, y]$ then y is the successor of k. Otherwise, we locate the closest alive preceding successor to k and forward the lookup to it.
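The routing logic of findSuccessor can be checked on a simplified, static ring. The Python sketch below is an illustration with strong assumptions: all nodes are alive and all fingers and first successors are correct, so the isAlive checks and the successor-list fallback of Case C are never needed and are omitted. It returns the successor found together with the number of forwarding hops; all helper names are hypothetical.

```python
import bisect
import random


def in_open(x, a, b, K):
    """x in the open ring interval (a, b) modulo K; (a, a) is the ring minus a."""
    if a == b:
        return x != a
    return 0 < (x - a) % K < (b - a) % K


def in_half_open(x, a, b, K):
    """x in the ring interval (a, b] modulo K; (a, a] is the whole ring."""
    return a == b or 0 < (x - a) % K <= (b - a) % K


def build_static_ring(N, K, rng):
    """N alive nodes with correct first successors and fingers (K a power of two)."""
    nodes = sorted(rng.sample(range(K), N))
    M = K.bit_length() - 1

    def succ(key):
        i = bisect.bisect_left(nodes, key % K)
        return nodes[i % N]

    fingers = {n: [succ(n + 2 ** (i - 1)) for i in range(1, M + 1)] for n in nodes}
    s1 = {n: succ(n + 1) for n in nodes}
    return nodes, fingers, s1, succ


def find_successor(n, k, fingers, s1, K):
    """Iterative findSuccessor on a failure-free ring; returns (successor, hops)."""
    hops = 0
    while True:
        if k == n:                          # Case A
            return n, hops
        if in_half_open(k, n, s1[n], K):    # Case B
            return s1[n], hops
        nxt = None                          # Case C: closest preceding finger
        for f in reversed(fingers[n]):      # scan fin_M down to fin_1
            if in_open(f, n, k, K):
                nxt = f
                break
        n, hops = nxt, hops + 1             # fin_1 = s_1, so nxt is always found
```

On such a failure-free ring the hop count stays within $O(\log N)$; churn affects lookups only through the failed and wrong pointers analysed in Section IV.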

C. Failures 

Throughout the code we use the calls isAlive and isNotAlive. A simple interpretation of these routines would be to equate them with a ping. However, a correct implementation is one in which a failure is discovered by performing the operation required. For instance, a call to firstAliveSuccessor in Fig. 1 is performed to retrieve a node y and then to call y.iThinkIamYourPred, so alternatively the first alive successor could be discovered by iterating on the successor list and calling iThinkIamYourPred.

IV. The Analysis 
A. Distributional Properties of Inter-Node Distances 

In this section we will assume that all keys are populated by peers with independent and equal probability, and, furthermore, that this probability does not change with time. The first condition is a natural consequence of peers joining and leaving/failing independently. The last condition, on the other hand, does not hold strictly, since the number of peers present under churn is a fluctuating quantity. Nevertheless, it can be expected to hold to good accuracy in sufficiently large systems. A detailed analysis along these lines will be given elsewhere.

Definition 4.1: Given two keys $u, v \in \{0, \ldots, K-1\}$, the "distance" between them is $u - v$ (with modulo-K arithmetic). We interchangeably say that u and v form an "interval" of length $u - v$. Hence an interval of length l contains $l - 1$ keys.

Fig. 4
(a) Case when n and p have the same value of fin_k.node. (b) Case where a newly joined node p copies the kth entry of its successor node n as the best approximation for its own kth entry (by the join protocol). In this case, there could be a node o which is the 'correct' entry for p.fin_k.node. However, since p is newly joined, the only information it has access to is the finger table of n.

Property 4.1: The probability P(x) of finding an interval of length x is $P(x) = p^{x-1}(1-p)$, where $p = 1 - N/K$. Under the stated conditions, each key will be populated with the same probability $N/K = 1 - p$, for $N \ll K$. An interval of length x then involves $x - 1$ consecutive unpopulated keys followed by one populated key, which explains the formula.
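Property 4.1 can be checked numerically. The Python sketch below populates each key of a ring independently with probability $1 - p$ and compares the empirical frequency of interval lengths with $P(x)$. The helper names are hypothetical, and the example uses a fairly dense ring ($p = 0.8$) only so that the frequencies are easy to read; the property itself does not depend on that choice.

```python
import random


def interval_pmf(x, p):
    """Property 4.1: probability that an interval has length x (x >= 1)."""
    return p ** (x - 1) * (1 - p)


def empirical_interval_lengths(K, p, rng):
    """Populate each of the K ring keys independently with probability 1 - p
    and return the lengths of the intervals between consecutive nodes."""
    keys = [k for k in range(K) if rng.random() < 1 - p]
    return [(keys[(i + 1) % len(keys)] - keys[i]) % K for i in range(len(keys))]


gaps = empirical_interval_lengths(100_000, 0.8, random.Random(3))
for x in (1, 2, 3):
    freq = sum(1 for g in gaps if g == x) / len(gaps)
    print(x, round(freq, 3), round(interval_pmf(x, 0.8), 3))
```

The geometric form is what makes the later properties tractable: the absence of nodes in disjoint stretches of keys factorizes into powers of p.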

We now derive some properties of this distribution which 
will be used in the ensuing analysis. 

Property 4.2: For any two keys u and v, where $v = u + x$, let $b(i)$ be the probability that the first node encountered in between these two keys is at $u + i$ (where $0 \le i < x$). Then $b(i) = p^i(1-p)$. The probability that there is at least one node between u and v is $a(x) = 1 - p^x$. Hence the conditional probability that the first node is at a distance i, given that there is at least one node in the interval, is $b_c(i, x) = b(i)/a(x)$.
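The quantities in Property 4.2 can be verified by brute force for small x. In this hypothetical Python helper, `b`, `a` and `b_c` follow the formulas above, with i running over $0, \ldots, x-1$ (our reading of the range of i), and `brute_force_first_node` enumerates every occupancy pattern of the x keys $u, \ldots, u + x - 1$, each occupied independently with probability $1 - p$.

```python
from itertools import product


def b(i, p):
    """Probability that the first node is at u + i (keys u..u+i-1 empty)."""
    return p ** i * (1 - p)


def a(x, p):
    """Probability that at least one of the x keys is populated."""
    return 1 - p ** x


def b_c(i, x, p):
    """Distance to the first node, conditioned on the interval being non-empty."""
    return b(i, p) / a(x, p)


def brute_force_first_node(x, p):
    """Exact distribution of the first populated key among x keys, by enumeration."""
    dist = [0.0] * x
    for pattern in product((False, True), repeat=x):
        prob = 1.0
        for occupied in pattern:
            prob *= (1 - p) if occupied else p
        if any(pattern):
            dist[pattern.index(True)] += prob
    return dist
```

Summing $b(i)$ over $i = 0, \ldots, x-1$ gives exactly $a(x)$, which is why $b_c(i, x)$ is a proper probability distribution.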

Property 4.3: The probability that a node and at least one of its immediate predecessors share the same kth finger is $p_1(k) = \frac{p}{1+p}\left(1 - p^{2^k - 2}\right)$. The explanation for this property goes as follows. If the distance between node n and its predecessor p is x, the distance between n.fin_k.start and p.fin_k.start is also x (see Fig. 4(a)). If there is no node in between n.fin_k.start and p.fin_k.start, then n.fin_k.node and p.fin_k.node will share the same value. From Property 4.1, the probability that the distance between n and p is x is $p^{x-1}(1-p)$. However, x has to be less than $2^{k-1}$, otherwise p.fin_k.node will be equal to n. The probability that no node exists between n.fin_k.start and p.fin_k.start is $p^x$ (by Property 4.2). Therefore the probability that n.fin_k.node and p.fin_k.node share the same value is

$$p_1(k) = \sum_{x=1}^{2^{k-1}-1} p^{x-1}(1-p)\,p^{x} = \frac{p}{1+p}\left(1 - p^{2^k - 2}\right).$$

It is straightforward (though tedious) to derive similar expressions for $p_2(k)$, the probability that a node and at least two of its immediate predecessors share the same kth finger, $p_3(k)$, and so on.
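As a sanity check on the summation in Property 4.3, the following Python sketch compares the direct sum (with x running from 1 to $2^{k-1} - 1$, as reconstructed above) against the closed form. The function names are hypothetical.

```python
def p1_sum(k, p):
    """Direct sum over the predecessor distance x, with 1 <= x < 2**(k-1)."""
    upper = 2 ** (k - 1) - 1
    return sum(p ** (x - 1) * (1 - p) * p ** x for x in range(1, upper + 1))


def p1_closed(k, p):
    """Closed form of the same geometric sum."""
    return p / (1 + p) * (1 - p ** (2 ** k - 2))


for k in (1, 4, 8, 12):
    print(k, p1_sum(k, 0.999), p1_closed(k, 0.999))
```

When $2^k$ is much larger than $K/N$, the factor $1 - p^{2^k - 2}$ approaches 1, so $p_1(k)$ approaches $p/(1+p)$, which is close to one half when $N \ll K$.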

Property 4.4: We can similarly assess the probability that the join protocol (see Section III-B) results in further replication of the kth pointer. Let us define $p_{join}(i, k)$ as the probability that a newly joined node chooses the ith entry of its successor's finger table for its own kth entry. Note that this is unambiguous even in the case that the successor's ith entry is repeated. All we are asking is: when is the kth entry of the new joinee the same as the ith entry of the successor?
Fig. 5
Changes in $W_1$, the number of wrong (failed or outdated) s_1 pointers, due to joins, failures and stabilizations. [Figure panels: the ring at time t and at time t + Δt, before and after a join, a failure and a stabilization, with the resulting change in W_1 (+1, −1, or +1 − 1 = 0); legend: failed or outdated s_1 pointer, correct pointer, alive node, failed or outdated node.]



Clearly $i \le k$. In fact, for the larger fingers we only need to consider $p_{join}(k, k)$, since $p_{join}(i, k)$ is negligible for $i < k$. Using the interval distribution we find, for large k,

$$p_{join}(k, k) \approx p\left(1 - p^{2^{k-2}}\right) + (1-p)\left(1 - p^{2^{k-2}}\right)^2.$$

This function goes to 1 for large k.

We can also analogously compute $p_{join}(i, k)$ for any i. The only trick here is to estimate the probability that, starting from i, the last distinct entry of n's finger table does not give p a better choice for its kth entry. This can again readily be computed using Property 4.2, but we do not do the computation here since, for our purposes, $p_{join}(k, k)$ suffices.

B. Successor Pointers 

We now turn to estimating various quantities of interest for 
Chord. In all that follows we will evaluate various average 
quantities, as a function of the parameters. To do this we 
need to understand how the dynamical evolution of the system 
affects these quantities. 

In the case of Chord, we only need to consider one of three 
kinds of events happening at any micro-instant: a join, a failure 
or a stabilization. One assumption made in the following is that such a micro-instant of time exists; in other words, that we can divide time into intervals small enough that, in each interval, only one of these three processes occurs anywhere in the system. Implicit in this is the assumption that a stabilization (either of successors or fingers) is done faster than the time scales over which joins and failures occur.

Another aspect of this system which simplifies analysis is that successor pointers of adjacent nodes are independent of each other. That is, the state of the first successor pointer of a given node does not affect the state of the first successor pointer of either its predecessor or its successor. The same logic also works for the state of the second successor pointers of adjacent nodes, and so on. On the other hand, the state of the second successor pointer of a node is clearly related to the state of its first successor pointer as well as to the state of the first successor pointer of its successor. This is taken into account in the analysis of second and higher successor pointers. In characterizing the states of higher successors, we look for the leading-order behavior in terms of the parameter r.

TABLE I
Gain and loss terms for $W_1(r, \alpha)$: the number of wrong first successors as a function of r and $\alpha$.

Change in $W_1(r, \alpha)$ | Probability of occurrence
$W_1(t + \Delta t) = W_1(t) + 1$ | $c_{1.1} = (\lambda_j N \Delta t)(1 - w_1)$
$W_1(t + \Delta t) = W_1(t) + 1$ | $c_{1.2} = \lambda_f N (1 - w_1)^2 \Delta t$
$W_1(t + \Delta t) = W_1(t) - 1$ | $c_{1.3} = \lambda_f N w_1^2 \Delta t$
$W_1(t + \Delta t) = W_1(t) - 1$ | $c_{1.4} = \alpha\lambda_s N w_1 \Delta t$
$W_1(t + \Delta t) = W_1(t)$ | $1 - (c_{1.1} + c_{1.2} + c_{1.3} + c_{1.4})$

Consider first the successor pointers. Let $w_k(r, \alpha)$ denote the fraction of nodes having a wrong kth successor pointer and $d_k(r, \alpha)$ the fraction of nodes having a failed kth successor pointer. Also, let $W_k(r, \alpha)$ be the number of nodes having a wrong kth successor pointer and $D_k(r, \alpha)$ the number of nodes having a failed kth successor pointer. A failed pointer is one which points to a departed node, while a wrong pointer points either to an incorrect node (alive but not correct) or to a dead one. As we will see, both these quantities play a role in predicting lookup consistency and lookup length.

By the protocol for stabilizing successors in Chord, a node periodically contacts its first successor, possibly correcting it and reconciling with its successor list. Therefore, the number of wrong kth successor pointers is not an independent quantity but depends on the number of wrong first successor pointers.

We write an equation for $W_1(r, \alpha)$ by accounting for all the events that can change it in a micro-interval of time $\Delta t$. An illustration of the different cases in which changes in $W_1$ take place due to joins, failures and stabilizations is provided in Fig. 5. In some cases $W_1$ increases or decreases, while in others it stays unchanged. For each increase or decrease, Table I provides the corresponding probabilities.

By our implementation of the join protocol, a new node $n_y$ joining between two nodes $n_x$ and $n_z$ always has a correct $s_1$ pointer after the join. However, the state of $n_x.s_1$ before the join makes a difference. If $n_x.s_1$ was correct (pointing to $n_z$) before the join, then after the join it will be wrong and therefore $W_1$ increases by 1. If $n_x.s_1$ was wrong before the join, then it will remain wrong after the join and $W_1$ is unaffected. Thus, we need to account for the former case only. The probability that $n_x.s_1$ is correct is $1 - w_1$, and term $c_{1.1}$ follows from this.

For failures, we have 4 cases. To illustrate them we use nodes $n_x$, $n_y$, $n_z$ and assume that $n_y$ is going to fail. First, if both $n_x.s_1$ and $n_y.s_1$ were correct, then the failure of $n_y$ will make $n_x.s_1$ wrong and hence $W_1$ increases by 1. Second, if $n_x.s_1$ and $n_y.s_1$ were both wrong, then the failure of $n_y$ will decrease $W_1$ by one, since one wrong pointer disappears. Third, if $n_x.s_1$ was wrong and $n_y.s_1$ was correct, then $W_1$ is unaffected. Fourth, if $n_x.s_1$ was correct and $n_y.s_1$ was wrong, then the wrong pointer of $n_y$ disappears and $n_x.s_1$ becomes wrong; therefore $W_1$ is unaffected. For the first case to happen, we need to pick two nodes with correct pointers; the probability of this is $(1 - w_1)^2$. For the second case to happen, we need to pick two nodes with wrong pointers; the probability of this is $w_1^2$. From these probabilities follow the terms $c_{1.2}$ and $c_{1.3}$.

Finally, a successor stabilization does not affect $W_1$ unless the stabilizing node had a wrong pointer. The probability of picking such a node is $w_1$. From this follows the term $c_{1.4}$.

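Hence the equation for $W_1(r, \alpha)$ is:

$$\frac{dW_1}{N\,dt} = \lambda_j (1 - w_1) + \lambda_f (1 - w_1)^2 - \lambda_f w_1^2 - \alpha\lambda_s w_1.$$

Solving for $w_1$ in the steady state and putting $\lambda_j = \lambda_f$, we get:

$$w_1(r, \alpha) = \frac{2}{3 + r\alpha} \approx \frac{2}{r\alpha}. \qquad (1)$$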

This expression matches the simulation results well, as shown in Fig. 6. $d_1(r, \alpha)$ is then $\frac{1}{2} w_1(r, \alpha)$, since when $\lambda_j = \lambda_f$, about half of the wrong pointers are incorrect and about half point to dead nodes. Thus $d_1(r, \alpha) \approx \frac{1}{r\alpha}$, which also matches the simulations well, as shown in Fig. 6.
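Equation (1) can be checked by integrating the rate equation directly. The Python sketch below measures time in units of $1/\lambda_f$, sets $\lambda_j = \lambda_f$ as in the steady-state condition, and assumes N is constant, so that $dW_1/(N\,dt) = dw_1/dt$; forward Euler with a small step is used purely for simplicity, and the function names are hypothetical.

```python
def dw1_dt(w1, r, alpha):
    """Right-hand side of the W_1 equation divided by N, with
    lambda_j = lambda_f = 1 (time in units of 1/lambda_f) and lambda_s = r."""
    lam_j = lam_f = 1.0
    lam_s = r
    return lam_j * (1 - w1) + lam_f * (1 - w1) ** 2 - lam_f * w1 ** 2 - alpha * lam_s * w1


def integrate_w1(r, alpha, t_end=1.0, steps=100_000):
    """Forward-Euler integration starting with no wrong pointers (w1 = 0).
    t_end must be several times 1 / (3 + r * alpha) for w1 to settle."""
    w1, dt = 0.0, t_end / steps
    for _ in range(steps):
        w1 += dt * dw1_dt(w1, r, alpha)
    return w1


def w1_steady(r, alpha):
    """Equation (1)."""
    return 2.0 / (3.0 + r * alpha)


def d1_steady(r, alpha):
    """About half of the wrong s_1 pointers point to dead nodes."""
    return 0.5 * w1_steady(r, alpha)


print(integrate_w1(100, 0.5), w1_steady(100, 0.5))
```

Note that the quadratic terms cancel, $(1 - w_1)^2 - w_1^2 = 1 - 2w_1$, so the equation is linear in $w_1$ and relaxes to its fixed point on a time scale of $1/((3 + r\alpha)\lambda_f)$.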

The fraction of wrong second successors can be estimated in an analogous manner. Consider, for a node n, the possible states of the successor, $n.s_1$, the successor of the successor, $(n.s_1).s_1$, and the second successor, $n.s_2$. In a fully correct state, $(n.s_1).s_1$ and $n.s_2$ of course point to the same node. If in such a state either $n.s_1$ or $(n.s_1).s_1$ becomes incorrect through the action of a join or a failure, then $n.s_2$ is also incorrect. On the other hand, $n.s_2$ cannot be corrected by the stabilization protocol unless $n.s_1$ and $(n.s_1).s_1$ are both already corrected. Hence, $n.s_2$ is wrong if either $n.s_1$ or $(n.s_1).s_1$ is wrong, and also if both $n.s_1$ and $(n.s_1).s_1$ are correct but $n.s_2$ has not yet been corrected. Since adjacent first successor pointers are independent, the probability that at least one of $n.s_1$ and $(n.s_1).s_1$ is wrong is $1 - (1 - w_1)^2 = 2w_1 - w_1^2$. If the number of non-stabilized configurations is $N_2$ and their fraction is $n_2$, we have

$$w_2 = 2w_1 - w_1^2 + n_2. \qquad (2)$$



To estimate $n_2$ we consider how these configurations might be gained or lost. The gain term arises from stabilizations of configurations where $n.s_1$ is correct but $(n.s_1).s_1$ is wrong. A stabilization performed by node $n.s_1$ then results in the gain of an $N_2$ configuration. On the other hand, non-stabilized configurations are lost either by a stabilization performed by node n (when it gets the correct successor list from its successor and hence corrects $n.s_2$), or by corrupting either $n.s_1$ or $(n.s_1).s_1$ (by a join or failure). The latter possibility gives terms of higher order in $1/r$, and we can ignore it in the limit that stabilizations happen on a much faster time scale than joins and failures (i.e., r much larger than unity). The equation for $N_2$ is hence

$$\frac{dN_2}{N\,dt} = \alpha\lambda_s w_1 (1 - w_1) - \alpha\lambda_s n_2, \qquad (3)$$

which in the steady state gives $n_2 = w_1(1 - w_1)$, i.e., $n_2 \approx w_1$ to order $1/r$. Thus, we have

$$w_2(r, \alpha) \approx 3\,w_1(r, \alpha) \approx \frac{6}{r\alpha}.$$

For higher successors we reason similarly by considering 
the state of the k — 1 st successor pointer of node n, the suc- 
cessor pointer of the k — 1 st successor, and the k th successor 
pointer of node n. We can write a recursion equation for Wk 
the fraction of nodes with wrong k th successor pointer 



wi + w k -i - Wk-iWx + n k 



(4) 



where $n_k$ is the density of configurations where the $(k-1)$st successor pointer of node $n$ and the first successor pointer of the $(k-1)$st successor are both correct, but this information has not yet been used to correct the $k$th successor pointer of node $n$. If node $n$ does not yet have the correct information about its $k$th successor, then either all the nodes in between $n$ and its $(k-1)$st successor have the correct information but node $n$ has not yet stabilized, or the stabilization has propagated back from the $(k-1)$st successor to some node in between but not yet to $n.s_1$. To elaborate on this further, there is the case where the second successor pointer of the $(k-2)$nd successor has not been corrected, then the case where this has been done but the third successor pointer of the $(k-3)$rd successor has not been corrected, and so on. Each of these $k-1$ cases is analogous to $n_2$, and each occurs with density $(1 - w_{k-1})w_1$ if joins and failures are neglected compared to stabilizations, so that $n_k \approx (k-1)w_1$ to leading order. Hence, if to leading order in $1/r$ we have $w_k \approx c_k/r$, then

$$c_k = c_{k-1} + k\,c_1 \qquad (5)$$

which leads to

$$w_k \approx \frac{k(k+1)}{2}\,w_1 \qquad (6)$$



We note that this expression depends on the details of the stabilization scheme, and is in principle valid only up to $k \approx \sqrt{r}$, beyond which $w_k$ becomes of order one. As shown in Fig. 7, the agreement between theory and simulation is nevertheless quite reasonable at $k = 5$ and $r = 100$.
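The recursion (5) and its closed form (6) are easy to check numerically. The sketch below is only an illustration: `c1` stands for the leading-order coefficient in $w_1 \approx c_1/r$, which is treated as a free, hypothetical input, and the helper names are ours. It also locates the $k$ at which the leading-order estimate $w_k \approx c_k/r$ reaches one, which grows like $\sqrt{r}$.

```python
def successor_coefficients(k_max, c1):
    """Iterate Eq. (5), c_k = c_{k-1} + k * c1, for k = 1..k_max (leading order: w_k ~ c_k / r)."""
    c = {1: c1}
    for k in range(2, k_max + 1):
        c[k] = c[k - 1] + k * c1
    return c


def first_k_of_order_one(r, c1):
    """Smallest k with c_k / r >= 1, i.e. where the leading-order estimate w_k ~ c_k / r breaks down."""
    k, c_k = 1, c1
    while c_k / r < 1.0:
        k += 1
        c_k += k * c1
    return k


if __name__ == "__main__":
    c = successor_coefficients(10, c1=1.0)
    for k, ck in c.items():
        # Closed form of Eq. (6): c_k = k (k + 1) / 2 * c1
        print(k, ck, k * (k + 1) / 2)
    for r in (100, 10_000):
        print(r, first_k_of_order_one(r, c1=1.0), r ** 0.5)
```

With `c1=1.0` the threshold is reached at $k = 14$ for $r = 100$ and at $k = 141$ for $r = 10\,000$, i.e. roughly at $\sqrt{2r/c_1}$, consistent with the $\sqrt{r}$ scaling quoted above.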

C. Break-up (Network Disconnection) Probability 

We demonstrate below how calculating $d_k(r,\alpha)$, the fraction of nodes with a dead $k$th successor pointer, helps in estimating the probability that the network gets disconnected for any value of $r$ and $\alpha$. Let $P_{bu}(n,r,\alpha)$ be the probability that $n$ consecutive nodes fail. If $n = S$, the length of the successor list, then clearly the node whose successor list this is gets disconnected from the network, and the network breaks up. For the range of $r$ considered in Fig. 6, $P_{bu}(S,r,\alpha) \approx 0$. However, should we go to lower values of $r$, this probability starts becoming finite. The master equation analysis introduced here can be used to estimate $P_{bu}(n,r,\alpha)$ for any $1 < n \le S$. We indicate how this might be done by first considering the case $n = 2$. Let $N_{bu}(2,r,\alpha)$ be the number of configurations in which a node has both $s_1$ and $s_2$ dead, and let $P_{bu}(2,r,\alpha)$ be the fraction of such configurations. Table II indicates how this is estimated within the present framework.

TABLE II
Gain and loss terms for $N_{bu}(2,r,\alpha)$: the number of nodes with dead first and second successors.

| Change in $N_{bu}(2,r,\alpha)$ | Probability of occurrence |
|---|---|
| $N_{bu}(t+\Delta t) = N_{bu}(t) + 1$ | $c_{2.1} = (\lambda_f N \Delta t)\, d_1(r,\alpha)$ |
| $N_{bu}(t+\Delta t) = N_{bu}(t) + 1$ | $c_{2.2} = (\lambda_f N \Delta t)\,(1 - d_1)\, d_2$ |
| $N_{bu}(t+\Delta t) = N_{bu}(t) - 1$ | $c_{2.3} = (\alpha\lambda_s N \Delta t)\, P_{bu}(2,r,\alpha)$ |
| $N_{bu}(t+\Delta t) = N_{bu}(t)$ | $1 - (c_{2.1} + c_{2.2} + c_{2.3})$ |

A join event does not affect this probability in any way, so we only need to consider the effect of failures or stabilization events. The term $c_{2.1}$ accounts for the situation when the first successor of a node is dead (which happens with probability $d_1(r,\alpha)$, as explained above). A failure event can then kill its second successor as well, and this happens with probability $c_{2.1}$. The second term, $c_{2.2}$, accounts for the situation where the first successor is alive (with probability $1 - d_1$) but the second successor is dead (with probability $d_2$); a failure of the first successor then produces the configuration. The logic used to estimate $d_2$ (or $d_k$ in general) is very similar to the reasoning we used to estimate the $w_k$'s. So we have

$$d_k = d_1 + (k-1)\,d_1 = k\,d_1 \qquad (7)$$



Thus the $k$th successor of a node is dead if the $(k-1)$st successor's successor is dead, or if the $(k-1)$st successor's successor is not dead but the intermediate nodes think it is because they have not yet stabilized. Hence $d_k \approx k/(\alpha r)$, and in particular $d_2 \approx 2/(\alpha r)$. This estimate matches the simulation results very well, as shown in Fig. 8.
Coming back to counting the gain and loss terms for $N_{bu}(2,r,\alpha)$: a stabilization event reduces the number of such configurations by one if the node doing the stabilization had such a configuration to begin with. Solving the equation for $N_{bu}(2,r,\alpha)$, one hence obtains $P_{bu}(2,r,\alpha) \approx 3/(\alpha r)^2$. As Fig. 9 shows, this is a precise estimate.
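The steady-state step behind this result can be written out from the rows of Table II. Setting the expected change of $N_{bu}(2,r,\alpha)$ per interval $\Delta t$ to zero, and using $r = \lambda_s/\lambda_f$ together with the leading-order values $d_1 \approx 1/(\alpha r)$ and $d_2 \approx 2/(\alpha r)$ from (7), gives

$$\lambda_f N\,\big[d_1 + (1 - d_1)\,d_2\big] = \alpha\lambda_s N\,P_{bu}(2,r,\alpha),$$

$$P_{bu}(2,r,\alpha) = \frac{d_1 + (1 - d_1)\,d_2}{\alpha r} \approx \frac{d_1 + d_2}{\alpha r} = \frac{3}{(\alpha r)^2}.$$

The neglected $d_1 d_2$ term contributes only at order $1/r^3$.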

We can similarly estimate the probabilities for three consecutive nodes failing, and so on, and hence also the general disconnection probability $P_{bu}(S,r,\alpha)$. In fact, $P_{bu}(S,r,\alpha)$ may be written in terms of the $d_k(r,\alpha)$ as

$$P_{bu}(S) = \frac{(S-1)!}{(\alpha r)^{S-1}} \sum_{k=1}^{S} d_k \qquad (8)$$

The logic behind this equation is similar to that used for solving for $P_{bu}(2)$, namely that for $S$ consecutive nodes to fail, any $S-1$ of the $S$ nodes should have failed first, and then a failure event kills the remaining node. Eq. (8) is readily solved by substituting the values of the $d_k$'s to get

$$P_{bu}(S) \approx \frac{(S+1)!}{2\,(\alpha r)^S} \qquad (9)$$

As mentioned above, this is again correct only to leading order: there will be correction terms of order $1/r^{S+1}$, which we have not computed at this level of approximation. The master equation formalism thus affords the possibility of making a precise prediction of when the system runs the danger of getting disconnected, as a function of the parameters.
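As a consistency check, (8) with the leading-order values $d_k = k/(\alpha r)$ from (7) should reproduce the closed form (9) exactly. The minimal sketch below does this in exact rational arithmetic. The product $\alpha r$ is passed as a single hypothetical input `alpha_r`, and the function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def d(k, alpha_r):
    """Leading-order fraction of nodes with a dead k-th successor, Eq. (7): d_k = k / (alpha r)."""
    return Fraction(k) / Fraction(alpha_r)


def p_bu_eq8(S, alpha_r):
    """Eq. (8): P_bu(S) = (S-1)! / (alpha r)^(S-1) * (d_1 + ... + d_S)."""
    prefactor = Fraction(factorial(S - 1)) / Fraction(alpha_r) ** (S - 1)
    return prefactor * sum(d(k, alpha_r) for k in range(1, S + 1))


def p_bu_eq9(S, alpha_r):
    """Eq. (9): P_bu(S) = (S+1)! / (2 (alpha r)^S)."""
    return Fraction(factorial(S + 1)) / (2 * Fraction(alpha_r) ** S)


if __name__ == "__main__":
    alpha_r = 25  # hypothetical value of the product alpha * r
    for S in range(1, 9):
        assert p_bu_eq8(S, alpha_r) == p_bu_eq9(S, alpha_r)
        print(S, float(p_bu_eq9(S, alpha_r)))
```

For $S = 2$ and $\alpha r = 25$ both forms give $3/625$, which is $3/(\alpha r)^2$ as found above. The rapid decrease with $S$ shows why a modest successor list already makes break-up very unlikely.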

Lookup Consistency. By the lookup protocol, a lookup is inconsistent if the immediate predecessor of the sought key has a wrong $s_1$ pointer. However, we need only consider the case when the $s_1$ pointer is pointing to an alive (but incorrect) node, since our implementation of the protocol always requires the lookup to return an alive node as an answer to the query. The probability that a lookup is inconsistent, $I(r,\alpha)$, is hence $w_1(r,\alpha) - d_1(r,\alpha)$. This prediction matches the simulation results very well, as shown in Fig. 10.

D. Failure of Fingers

We now turn to estimating the fraction of finger pointers that point to failed nodes. As we will see, this is an important quantity for predicting lookups, since failed fingers cause timeouts and increase the lookup length. Moreover, we only need to consider fingers pointing to dead nodes: unlike members of the successor list, alive fingers, even if outdated, always bring a query closer to the destination and affect neither consistency nor, substantially, the lookup length. Therefore we consider fingers in only two states, alive or dead (failed). In our implementation of the stabilization protocol (see Sections III-A and III-B), fingers and successors are stabilized entirely independently of each other, to simplify the analysis. Thus, even though the first finger is also always the first successor, this information is not used by the node in updating the finger. Fingers of nodes far apart are independent of each other. Fingers of adjacent nodes can be correlated, and we take this into account. The only assumption in this section concerns the join protocol, as explained below.
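TABLE III
The relevant gain and loss terms for $F_k$, the number of nodes whose $k$th fingers are pointing to a failed node, for $k > 1$.

| Change in $F_k$ | Probability of occurrence |
|---|---|
| $F_k(t+\Delta t) = F_k(t) + 1$ | $c_{3.1} = (\lambda_j N \Delta t) \sum_{i} p_{join}(i,k)\, f_i$ |
| $F_k(t+\Delta t) = F_k(t) - 1$ | $c_{3.2} = (1-\alpha)\,\frac{1}{M}\, f_k\, (\lambda_s N \Delta t)$ |
| $F_k(t+\Delta t) = F_k(t) + 1$ | $c_{3.3} = (1-f_k)^2\,[1 - p_1(k)]\,(\lambda_f N \Delta t)$ |
| $F_k(t+\Delta t) = F_k(t) + 2$ | $c_{3.4} = (1-f_k)^2\,[p_1(k) - p_2(k)]\,(\lambda_f N \Delta t)$ |
| $F_k(t+\Delta t) = F_k(t) + 3$ | $c_{3.5} = (1-f_k)^2\,[p_2(k) - p_3(k)]\,(\lambda_f N \Delta t)$ |
| $F_k(t+\Delta t) = F_k(t)$ | $1 - (c_{3.1} + c_{3.2} + c_{3.3} + c_{3.4} + c_{3.5})$ |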



Let $f_k(r,\alpha)$ denote the fraction of nodes whose $k$th finger points to a failed node, and $F_k(r,\alpha)$ the respective number. For notational simplicity, we write these simply as $F_k$ and $f_k$. We can predict this function for any $k$ by again estimating the gain and loss terms for this quantity caused by a join, failure or stabilization event, and keeping only the most relevant terms. These are listed in Table III and illustrated in Fig. 11.

A join event can play a role here by increasing the number of failed $k$th finger pointers if the successor of the joinee had a failed $i$th pointer (which occurs with probability $f_i$) and the joinee replicated this from the successor as its own $k$th pointer (which occurs with probability $p_{join}(i,k)$ from Property 4.4). For large enough $k$, this probability is essentially one only for $p_{join}(k,k)$; that is, the new joinee mostly replicates only the successor's $k$th pointer as its own $k$th pointer. This is what we consider here.

A stabilization evicts a failed pointer if there was one to begin with. The stabilization rate is divided by $M$, since every time a node decides to stabilize a finger (at rate $(1-\alpha)\lambda_s$), it picks one of its $M$ fingers at random.

Given a node $n$ with an alive $k$th finger (which occurs with probability $1 - f_k$), when the node pointed to by that finger fails, the number of failed $k$th fingers ($F_k$) increases. The amount of this increase depends on the number of immediate predecessors of $n$ that were pointing to the failed node with their $k$th finger. That number of predecessors could be 0, 1, 2, etc. Using Property 4.3, the respective probabilities of those cases are $1 - p_1(k)$, $p_1(k) - p_2(k)$, $p_2(k) - p_3(k)$, etc. Solving for $f_k$ in the steady state, we get

$$f_k = \frac{2P_{rep}(k) + 2 - P_{join}(k) + \frac{r(1-\alpha)}{M}}{2\big(1 + P_{rep}(k)\big)} - \sqrt{\frac{\Big(2P_{rep}(k) + 2 - P_{join}(k) + \frac{r(1-\alpha)}{M}\Big)^2}{4\big(1 + P_{rep}(k)\big)^2} - 1} \qquad (10)$$

where $P_{rep}(k) = \sum_i p_i(k)$. In practice, it is enough to keep the first three terms in this sum. To first order in $1/r$ we have, in analogy to (6),

$$f_k \approx \frac{\big(1 + P_{rep}(k)\big)\,M}{(1-\alpha)\,r} \qquad (11)$$



This expression simply says that the fraction of dead fingers is inversely proportional to the rate of finger stabilizations, $(1-\alpha)r$, and proportional to the number of fingers there are to stabilize, $M$, with a proportionality factor $(1 + P_{rep}(k))$ that depends only on $\rho$.
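Eqs. (10) and (11) can be evaluated directly. The minimal sketch below assumes $\lambda_j = \lambda_f$, so that all rates can be measured in units of $\lambda_f$ with $r = \lambda_s/\lambda_f$. It computes $f_k$ from (10) and checks that this value makes the net drift implied by the Table III terms vanish, keeping only the $i = k$ join term as described above. The values of $p_1(k), p_2(k), p_3(k)$, $P_{join}(k)$, $M$, $r$ and $\alpha$ in the example are hypothetical inputs, not values from our simulations, and the function names are ours.

```python
import math


def expected_increment(p):
    """Mean increase of F_k when the target of an alive k-th finger fails.

    p = [p_1(k), p_2(k), ...]; with p_0 = 1 and p_j = 0 beyond the list,
    sum_j (j + 1) (p_j - p_{j+1}) equals 1 + sum(p) = 1 + P_rep(k).
    """
    probs = [1.0] + list(p) + [0.0]
    return sum((j + 1) * (probs[j] - probs[j + 1]) for j in range(len(probs) - 1))


def dead_finger_fraction(r, alpha, M, p, P_join):
    """Steady-state f_k from Eq. (10); assumes r is large enough that half > 1."""
    a = 1.0 + sum(p)                                   # 1 + P_rep(k)
    B = 2.0 * sum(p) + 2.0 - P_join + r * (1.0 - alpha) / M
    half = B / (2.0 * a)
    # half - sqrt(half^2 - 1), rewritten as 1 / (half + sqrt(half^2 - 1)) to avoid cancellation
    return 1.0 / (half + math.sqrt(half * half - 1.0))


def dead_finger_first_order(r, alpha, M, p):
    """Eq. (11): f_k ~ (1 + P_rep(k)) M / ((1 - alpha) r)."""
    return (1.0 + sum(p)) * M / ((1.0 - alpha) * r)


def net_drift(f, r, alpha, M, p, P_join):
    """Net expected change of F_k per node per unit time, in units of lambda_f (lambda_j = lambda_f)."""
    gain_join = P_join * f                             # c_3.1, keeping only the i = k term
    loss_stab = (1.0 - alpha) * r * f / M              # c_3.2
    gain_fail = (1.0 - f) ** 2 * expected_increment(p)  # c_3.3, c_3.4, ... weighted by +1, +2, ...
    return gain_join + gain_fail - loss_stab


if __name__ == "__main__":
    r, alpha, M = 1000.0, 0.5, 20                      # hypothetical parameters
    p, P_join = [0.3, 0.1, 0.03], 1.0                  # hypothetical p_i(k) and P_join(k)
    f = dead_finger_fraction(r, alpha, M, p, P_join)
    print(f, dead_finger_first_order(r, alpha, M, p), net_drift(f, r, alpha, M, p, P_join))
```

For these inputs the root of (10) and the first-order estimate (11) differ by well under 10%, and the gap closes as $r$ grows, as expected for a leading-order result. The other root of the balance equation is $1/f_k > 1$, which is why the smaller root is the physical one.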

To sum up, the computation of the fraction of dead $k$th finger pointers is analogous to the calculation of the fraction of wrong first successor pointers, albeit a bit more involved. No recursion is involved, in contrast to the calculation of the fraction of wrong higher successor pointers. Expressions (10) and (11) match the simulation results very well (Fig. 23).

E. Cost of Finger Stabilizations and Lookups 

In this section, we demonstrate how the information about the failed fingers and successors can be used to predict the cost of stabilizations, lookups, or in general the cost of reaching any key in the id space. By cost we mean the number of hops needed to reach the destination, including the number of timeouts encountered en route. Timeouts occur every time a query is passed to a dead node: the node does not answer, and the originator of the query has to use another finger instead. For this analysis, we consider timeouts and hops to add equally to the cost. We can easily generalize this analysis to investigate the case when a timeout costs some factor $\gamma$ times the cost of a hop.

Define $C_t(r,\alpha)$ (also denoted by $C_t$) to be the expected cost for a given node to reach some target key which is $t$ keys away from it (which means reaching the first successor of this key). For example, $C_1$ is then the cost of looking up the adjacent key (1 key away). Since the adjacent key is always stored at the first alive successor, if the first successor is alive (which occurs with probability $1 - d_1$), the cost will be 1 hop. If the first successor is dead but the second is alive (which occurs with probability $d_1(1 - d_2)$), the cost will be 1 hop + 1 timeout = 2, and the expected cost is $2 \times d_1(1 - d_2)$, and so forth. Therefore, we have

$$C_1 = (1 - d_1) + 2\,d_1(1 - d_2) + 3\,d_1 d_2(1 - d_3) + \cdots \approx 1 + d_1 = 1 + \frac{1}{\alpha r}.$$
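The series for $C_1$ can be summed numerically to see how good the approximation $1 + d_1$ is. The minimal sketch below assumes, as the series above does, that the successors' states combine as independent factors, and uses the leading-order values $d_k = k/(\alpha r)$ from (7). The optional `timeout_cost` argument is our addition, corresponding to the factor $\gamma$ mentioned earlier: it charges each timeout $\gamma$ instead of 1, which to leading order gives $C_1 \approx 1 + \gamma d_1$.

```python
def c1_series(alpha_r, timeout_cost=1.0, max_terms=60):
    """Expected cost of reaching the adjacent key.

    The j-th successor is the first alive one with probability
    d_1 * ... * d_{j-1} * (1 - d_j); reaching it costs 1 hop plus (j - 1) timeouts.
    Uses the leading-order d_k = k / (alpha r) from Eq. (7), capped at 1.
    """
    total = 0.0
    all_dead_so_far = 1.0                     # d_1 * ... * d_{j-1}
    for j in range(1, max_terms + 1):
        d_j = min(1.0, j / alpha_r)
        total += all_dead_so_far * (1.0 - d_j) * (1.0 + (j - 1) * timeout_cost)
        all_dead_so_far *= d_j
    return total


if __name__ == "__main__":
    for alpha_r in (10.0, 100.0, 1000.0):
        print(alpha_r, c1_series(alpha_r), 1.0 + 1.0 / alpha_r)
```

The difference from $1 + d_1$ is of order $d_1 d_2 = 2/(\alpha r)^2$, confirming that $C_1 \approx 1 + d_1$ is a leading-order statement.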



To find the expected cost of reaching a general distance $t$, we need to follow the Chord protocol closely, which would look up $t$ by first finding the closest preceding finger. For the purposes of the analysis, we find it easier to think in terms of the closest preceding start. Let us hence define $\xi$ to be the start of the finger (say the $k$th) that most closely precedes $t$. Hence $\xi = 2^{k-1} + n$ and $t = \xi + m$, i.e., there are $m$ keys between the sought target $t$ and the start of the closest preceding finger. With that, we can write a recursion relation for $C_{\xi+m}$ as follows:



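$$
\begin{aligned}
C_{\xi+m} ={}& C_\xi\,[1 - a(m)] \\
&+ (1 - f_k)\,a(m)\Big[1 + \sum_{i=0}^{m-1} bc(i,m)\,C_{m-i}\Big] \\
&+ f_k\,a(m)\Big[1 + \sum_{i=1}^{k-1} h_k(i) \sum_{l=0}^{\xi/2^i - 1} bc(l,\xi/2^i)\,\big(1 + (i-1) + C_{\xi_i - l + m}\big)\Big] \\
&+ O\big(h_k(k)\big)
\end{aligned}
\qquad (12)
$$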



where $\xi_i = \sum_{m=1}^{i} \xi/2^m$, and $h_k(i)$ is the probability that a node is forced to use its $(k-i)$th finger owing to the death of its $k$th finger. The probabilities $a$, $b$ and $bc$ have already been introduced in Section IV, and we define the probability $h_k(i)$ below.

The lookup equation, though rather complicated at first sight, merely accounts for all the possibilities that a Chord lookup will encounter, and deals with them exactly as the protocol dictates.

The first term (Fig. 12(a)) accounts for the eventuality that there is no node intervening between $\xi$ and $\xi + m$ (which occurs with probability $1 - a(m)$). In this case, the cost of looking for $\xi + m$ is the same as the cost of looking for $\xi$.

The second term (Fig. 12(b)) accounts for the situation when a node does intervene in between (with probability $a(m)$) and this node is alive (with probability $1 - f_k$). Then the query is passed on to this node (with 1 added to register the increase in the number of hops), and the cost then depends on the distance between this node and $t$.

Fig. 12: Cases that a lookup can encounter with the respective probabilities and costs.

The third term (Fig. 12(c)) accounts for the case when the intervening node is dead (with probability $f_k$). Then the cost increases by 1 (for a timeout), and the query needs to find an alternative lower finger that most closely precedes the target. Let the $(k-i)$th finger (for some $i$, $1 \le i \le k-1$) be such a finger. The probability that the lookup is passed back to the $(k-i)$th finger, either because the intervening fingers are dead or because they share the same finger table entry as the $k$th finger, is denoted by $h_k(i)$. The start of the $(k-i)$th finger is at $\xi/2^i$, and the distance between $\xi/2^i$ and $\xi$ is equal to $\sum_{m=1}^{i} \xi/2^m$, which we denote by $\xi_i$. Therefore, the distance from the start of the $(k-i)$th finger to the target is equal to $\xi_i + m$. However, note that $\mathrm{fin}_{k-i}$-node could be $l$ keys away (with probability $bc(l,\xi/2^i)$) from $\mathrm{fin}_{k-i}$-start (for some $l$, $0 \le l < \xi/2^i$). Therefore, after making one hop to $\mathrm{fin}_{k-i}$-node, the remaining distance to the target is $\xi_i + m - l$. The increase in cost for this operation is $1 + (i-1)$: the 1 indicates the cost of taking up the query again by $\mathrm{fin}_{k-i}$-node, and the $i-1$ indicates the cost of trying and discarding each of the $i-1$ intervening fingers. The probability $h_k(i)$ is easy to compute given Property 4.2 and the expression for the $f_k$'s computed in the previous section.

$$
\begin{aligned}
h_k(i) &= a(\xi/2^i)\,(1 - f_{k-i}) \prod_{s=1}^{i-1}\big(1 - a(\xi/2^s) + a(\xi/2^s)\,f_{k-s}\big), \qquad i < k \\
h_k(k) &= \prod_{s=1}^{k-1}\big(1 - a(\xi/2^s) + a(\xi/2^s)\,f_{k-s}\big)
\end{aligned}
\qquad (13)
$$

In (13) we account for all the reasons that a node may have to use its $(k-i)$th finger instead of its $k$th finger. This could happen because the intervening fingers were either dead or not distinct. The probabilities $h_k(i)$ satisfy the constraint $\sum_{i=1}^{k} h_k(i) = 1$, since clearly either a node uses one of its fingers or it does not. The latter probability is $h_k(k)$, that is, the probability that a node cannot use any earlier entry in its finger table. In this case, $n$ proceeds to its successor list. The query is now passed on to the first alive successor, and the new cost is a function of the distance of this node from the target $t$. We indicate this case by the last term in (12), which is $O(h_k(k))$.
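The normalization $\sum_{i=1}^{k} h_k(i) = 1$ of (13) can be checked numerically. The minimal sketch below assumes, for illustration only, that $a(m) = 1 - \rho^m$, i.e. the probability that at least one node lies among $m$ consecutive keys when each key is occupied independently with probability $1 - \rho$; the $a(m)$ actually used in the analysis is the one introduced in Section IV. $\xi$ is taken as the distance $2^{k-1}$ from the node, and the values of $\rho$ and the $f_j$ are hypothetical.

```python
def a(m, rho):
    """Assumed probability that a gap of m keys contains at least one node (each key occupied w.p. 1 - rho)."""
    return 1.0 - rho ** m


def h_probabilities(k, xi, f, rho):
    """h_k(i) for i = 1..k following Eq. (13).

    f maps a finger index j to f_j, the fraction of dead j-th fingers.
    Finger k-s is unusable if its gap of xi / 2**s keys is empty, or non-empty but the finger is dead.
    """
    h = {}
    all_unusable = 1.0                        # product over s < i of P(finger k-s unusable)
    for i in range(1, k):
        a_i = a(xi / 2 ** i, rho)
        h[i] = a_i * (1.0 - f[k - i]) * all_unusable
        all_unusable *= 1.0 - a_i + a_i * f[k - i]
    h[k] = all_unusable                       # no earlier finger usable: fall back to the successor list
    return h


if __name__ == "__main__":
    k, rho = 12, 1.0 - 1000 / 2 ** 16         # hypothetical: about 1000 nodes in 2**16 keys
    xi = 2 ** (k - 1)
    f = {j: 0.05 for j in range(1, k + 1)}    # hypothetical, k-independent dead-finger fraction
    h = h_probabilities(k, xi, f, rho)
    print(sum(h.values()), h[1], h[k])
```

The sum telescopes to one because each $h_k(i)$ is "finger $k-i$ usable" times "all fingers between $k$ and $k-i$ unusable", which is exactly the structure of (13).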



This can again be computed from the inter-node distribution and from the functions $d_k(r,\alpha)$ computed earlier. However, in practice, the probability of this case is extremely small except for targets very close to $n$. Hence it does not significantly affect the value of general lookups, and we ignore it in our analysis. The cost of a general lookup is hence obtained by averaging $C_t$ over the target distances $t$.



The lookup equation is solved recursively and numerically, given the coefficients and $C_1$. In Fig. 13 we compare theoretical results with simulation for $N = 1000$. The theory matches the simulation results very well.

In Fig. 14 we also show the theoretical predictions for some larger values of $N$. From the structure of Eq. (12), it is clear that the dependence of the average lookup length on churn comes entirely from the presence of the terms $f_k$. Since $f_k \approx f$ is independent of $k$ for large fingers, we can approximate the average lookup length by the functional form $L(r,\alpha) = A + Bf + Cf^2 + \cdots$. The coefficients $A$, $B$, $C$, etc. can be recursively computed by solving the lookup equation to the required order in $f$, and depend only on $N$, the number of nodes; $1 - \rho$, the density of peers; and $b$, the base, or equivalently the size of the finger table of each node. The advantage of writing the lookup length this way is that churn-specific details, such as how new joinees construct a finger table or how exactly stabilizations are done in the system, can be isolated in the expression for $f$. If we were to change our stabilization strategy, for example [9], we could immediately estimate the lookup length by plugging the new expression for $f$ into the above relation.

Fig. 14: Lookup cost, theoretical curve, for 1000, 2000, 4000, 8000 and 16000 peers. The rationale for the fits is explained in the text.

The coefficient $A$, which is the lookup cost without churn, can be obtained very precisely for any base $b$ by analyzing (12) in the zero-churn case. This analysis is rather laborious and will be presented elsewhere [9]. It confirms the well-known result $A \approx \frac{1}{2}\log_2 N$ and in addition reproduces small deviations from this behavior previously observed by us in numerical simulations [7]. The values of $A$ in Fig. 14 are taken from this analysis.

$B$ can be qualitatively estimated as follows: every sufficiently long finger is dead with some finite probability $f$ given by (10). If $A$ is the average value of the lookup length without churn, then each lookup encounters $fA$ dead fingers on average. This estimate predicts a lookup cost of approximately $A(1 + f)$, giving $B = A$ and $C$ and all other coefficients equal to 0.

In Fig. 14 we show that the best fit to the data is in fact obtained by taking $B = A$ and $C = 3A$. The expression for $f$ is taken from (10) for large $k$ (for a system with 20 fingers, the expression for $f_k$ becomes independent of $k$ for $k > 13$). In general, as mentioned earlier, $B$ and $C$ can be obtained accurately for any value of the system parameters by the numerical solution of Eq. (12) to the required order.
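Putting these pieces together gives a quick estimate of the lookup cost under churn. The minimal sketch below combines the fitted form $L \approx A(1 + f + 3f^2)$ with the first-order dead-finger fraction (11). It approximates $A$ by $\frac{1}{2}\log_2 N$, whereas the values of $A$ used for Fig. 14 come from the more precise zero-churn analysis. The values of $P_{rep}$, $M$, $\alpha$ and $r$ in the example are hypothetical, and the helper names are ours.

```python
import math


def dead_finger_fraction_large_k(r, alpha, M, P_rep):
    """First-order f for long fingers, Eq. (11): (1 + P_rep) M / ((1 - alpha) r)."""
    return (1.0 + P_rep) * M / ((1.0 - alpha) * r)


def lookup_cost_estimate(N, f):
    """Fitted form L ~ A (1 + f + 3 f^2), with A approximated by (1/2) log2 N."""
    A = 0.5 * math.log2(N)
    return A * (1.0 + f + 3.0 * f * f)


if __name__ == "__main__":
    M, alpha, P_rep = 20, 0.5, 0.4            # hypothetical system parameters
    for N in (1000, 16000):
        for r in (500.0, 5000.0):
            f = dead_finger_fraction_large_k(r, alpha, M, P_rep)
            print(N, r, round(f, 4), round(lookup_cost_estimate(N, f), 3))
```

Because churn enters only through $f$, a different join or stabilization strategy would only change the first function, which is the design point made above.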

V. Discussion and Conclusion 

In this paper we have presented a detailed theoretical analysis of a DHT-based P2P system, Chord, using a fluid model. The technique for deriving the fluid model has been borrowed from the master equation approach of physics, which helps in systematically taking different dynamical effects into account. This analysis differs from previous theoretical work on DHTs in that it aims not at establishing bounds, but at a precise determination of the relevant quantities in this dynamically evolving system. From the match between our theory and the simulations, it can be seen that we can predict with an accuracy better than 1% in most cases. Though this analysis is not exact, since it takes only some (and not all) correlations into account, it provides a methodology for keeping track of most of the relevant details of the system. We expect that a similar analysis can be done for most other DHTs, thus helping to establish quantitative guidelines for their comparison.

The main conclusions for the analysis of Chord in a 
statistically steady state are the following. 

Property 5.1: As a function of $r$, the ratio of the rate of stabilizations to the rate of failures, the fraction of wrong pointers of any kind (successors or fingers) is, to leading order and to a good approximation, $\mathrm{const}/r$, where the constant depends on the pointer.

Property 5.2: The probability of break up of a ring can be 
estimated from the knowledge of the fraction of wrong first 
successors, wrong second successors, etc. This probability is 
generally very low when every node has a sufficient number of 
successors, indicating that Chord is robust against ring break- 
up. 

Property 5.3: At a given value of $r$, the fraction of wrong successors, $w_k$, and the fraction of dead fingers, $f_k$, increase with $k$. The fraction of wrong successors increases indefinitely and becomes of order one at $k$ of about $\sqrt{r}$ for the particular stabilization strategy that we have used. The fraction of dead fingers, on the other hand, tends to a constant for sufficiently large $k$.

Property 5.4: The lookup cost, which is the expected number of hops including timeouts, can be computed by numerical recursion. The fraction of incorrect finger pointers ($\approx f$ for large $k$) is a required input for this recursion. The lookup cost tends to the well-known average number of hops without churn when $f$ is small (or churn is low), and increases when $f$ is large. We show that it can be well described by the formula $A(1 + g(f))$, where $A$ is the value of the lookup cost without churn and $g(f)$ is well approximated by $f + 3f^2$ for $N \ll K$. In general, $g(f)$ can be obtained accurately to any desired order by solving Eq. (12) recursively to the required order in $f$.

Property 5.5: The preceding note brings out the following 
simple feature of Chord: under any state of churn, sufficiently 
long fingers are all dead with essentially the same probability. 
Hence, in a sufficiently large system, a look-up will almost 
surely encounter one or more dead fingers, leading to time- 
outs. For applications where time-outs should be the exception 
and not the norm, this paper helps in estimating how much 
stabilization is necessary under a given level of churn, to 
achieve such a level of performance. 

Property 5.6: The preceding note also brings out the additional feature that, by writing the lookup cost in the above simplified form, we can isolate the effects of churn-specific details in the expression for $f$. Changing details in the join protocol or changing the maintenance strategy [9] merely changes the expression for $f$. The lookup cost with this new strategy can then be immediately assessed for any $r$ by plugging the new expression for $f$ into the expression for the lookup cost (as opposed to solving Eq. (12) each time for each value of $r$).

The impact of this work can be summarized as follows: given that periodic stabilization is a fundamental technique for topology maintenance in DHTs, the question "How often should a DHT node perform periodic stabilization?" is of great practical relevance. The answer to this question depends on several factors. First, we need to know where the DHT is deployed (in a LAN, in a cooperative milieu, or among public non-trusting partners), i.e., what is the expected join/failure rate (churn)? Second, since DHTs involve different types of stabilizations, we need to know which of these rates is of interest to optimize. For example, in the DHT studied in this paper, there is both ring stabilization and finger stabilization. Third, we also need to know whether we have performance goals, which require us to know how much stabilization is needed, or constraints on bandwidth, which necessitate a knowledge of the expected performance. Previous analytical attempts (see Section II) have addressed these questions through the identification of general (algorithm/system-neutral) bounds on stabilization rates.

In this paper, we have taken another point of view: we have traded off generality for accuracy. That is, we have produced results that can describe, to a very high degree of accuracy, quantities like the probability of inconsistent lookups and the expected lookup length as functions of the stabilization and churn rates. Many of the insights we get from this analysis, such as most of the points listed above, would be very hard to come by from simulations alone. For instance, the formulae produced in this paper could directly be used by a system administrator or the person in charge of deploying a DHT as a guide for configuring stabilization rates. While the results are based on Chord, all analyses concerning the ring (break-up and inconsistency) are applicable to many other systems, since consistent hashing on a ring is a recurring component in many other DHTs.

VI. Limitations and Future Work 

The main limitation of this work stems from the fact that the results are inherently dependent on the intricate details of the analyzed algorithms. While some changes in the algorithms can be easily accommodated without redoing the analysis (as explained in Property 5.6), others, such as a different lookup strategy or a different placement of fingers, would necessitate recalculating all the quantities. However, results concerning the ring-related aspects, like successor lists, break-up probability and inter-node distributions, are likely to be reusable in other variations of the Chord protocol as well as in other systems using a ring geometry.

For the future, the authors' research agenda includes extensions to the current model to account for locality awareness and different topology maintenance techniques. Some work towards the latter goal has already been done in [9]. Relatedly, a useful application of this work would be to enable systems to dynamically self-tune their stabilization rates and choose the best maintenance technique to achieve a desired hop count.